Access to Data, Project Files, and GitHub Repository
Use this link to clone our GitHub repository, BST260-Final-Team8, into a local directory: https://github.com/marissachan/BST260-Final-Team8.git
Overview and Motivation
According to the United Nations, a person is forced to leave home every two seconds. In 2020, 3.5% of the world’s population were international migrants (IOM 2020).
Due to ongoing conflicts, violence, and climate change, the number of individuals migrating to other countries has grown substantially in recent years. Major displacement and migration events that have driven this increase include economic and political instability in Venezuela, climate hazards in the Philippines and China, and violence in Bangladesh (IOM 2020). Unfortunately, numerous other events also drive the number of migrants and the patterns of migration that we are now witnessing. These complex and changing migration patterns make it difficult to track migration routes and the tragedies that occur on migrants’ journeys to their destination countries.
The migration crisis demands public policy that improves asylum processes and humanitarian relief and that addresses the root causes and impacts of migration (e.g., climate change, conflict, and violence) using a human rights framework (OHCHR).
BBC 2021
Project Objectives (Initial Questions)
Our main objective was to analyze the main causes of death among migrants, spatially and temporally. We were specifically interested in answering the following questions:
(1) What is the spatial distribution of migrant deaths (more specifically, causes of death)? How does it differ by year? (Mapping)
(2) What are the most frequent causes of death among migrants? How do they differ by year? (Text Analysis)
(3) What model is appropriate for predicting migrant deaths from violence? (Machine Learning)
(4) Which migration routes have violence as the most frequent cause of death? (Decision Tree)
Screencast
Click here for a screencast overview of the project
Source and Methodology
The Missing Migrants Project data is used to inform indicator 10.7.3 of the 2030 Agenda for Sustainable Development, the “number of people who died or disappeared in the process of migration towards an international destination,” calling on all the world’s governments to address what the International Organization for Migration (IOM) describes as “an epidemic of crime and abuse” (Missing Migrants Project 2020). Using a total of 1948 information sources across the globe, IOM’s Missing Migrants Project tracks deaths and disappearances of migrants, including refugees and asylum-seekers, who have gone missing along mixed migration routes worldwide. More specifically, the project focuses on migrants who have died at the external borders of states or in the process of migration towards an international destination, regardless of legal status. It excludes internally displaced persons, who make up one of the largest groups of migrants.
We conducted a variety of exploratory and main analyses (mapping, text analysis, machine learning, and decision tree) to answer our questions listed above and to elucidate the experiences and tragedies of migrants across the world. This project aims to contribute to research examining the patterns of migration in efforts to protect migrant health and safety.
Atlantic 2015
Exploratory Analysis
Waves of Migration
Here, we see that 2016 had the highest number of deaths and disappearances due to global migration.
From Waves to a Constant Flow of Migration
Changes in patterns of migration over time
From 2014 to 2020, earlier events were less frequent but involved higher numbers of deaths and disappearances, while later incidents were more frequent but involved fewer deaths and disappearances. This may be due to the drastic increase in forced displacement from a limited number of countries; this surge was concentrated between 2012 and 2017, with 67% of refugees coming from 5 countries: the Syrian Arab Republic (6.7 mil), Afghanistan (2.7 mil), South Sudan (2.3 mil), Myanmar (1.1 mil), and Somalia (0.9 mil) (UNHCR). In 2019, by contrast, conflict, violence, and disasters triggered 33.4 million new internal displacements across 145 countries. Thus, the trends in global migration at large seem to echo the conflicts over time.
Here, we see that the migration trends change after 2016. The years 2014 through 2016 share a similar pattern of large fluctuations, with much larger-scale incidents of deaths and disappearances, while 2018 through 2020 share a similar pattern of more consistent and constant migration.
Understanding causes of death in migration incidents over time
Using this interactive graph, we can examine the cause of death and number dead by incident by region.
Looking at migration incidents by continent
This allows us to see where the most incidents occur across continents. We still have to consider that we do not know where the migrants’ journeys began.
Main analyses (1-4)
(1) Examining Causes of Deaths by Region
Using the interactive map below, we can examine the causes of death and number of incidents reported by region. The number of incidents increased every year until 2019, and then decreased in 2020 and 2021. The user can interact with the map by choosing the year of interest using the toggle on the right and clicking on the circle markers to zoom into the causes of death reported for each location of incident.
To view the map in a separate window, please click here
(2) Examining Causes of Death using Text Analysis
The table below presents the 22 main words identified from the causes of death reported from 2014-2021. The words with the highest frequency were “unknown”, “mixed”, and “drowning”. Additionally, “lack” and “adequate” were mentioned frequently. The words that had the lowest frequency were “accidental”, “access”, “healthcare”, and “sickness”.
One significant limitation of this analysis is that the causes of death reported in the Missing Migrants dataset are standardized (Note: most of this dataset’s string variables are standardized). In other words, only a fixed set of causes of death is reported, which means that the most frequently reported words will be related. For example, “mixed” and “unknown” are both part of the listed cause of death “Mixed or Unknown”, so they always occur together. While this does limit our ability to draw in-depth conclusions, we are still able to identify the main causes of death reported from 2014-2021.
While standardizing the causes of death does make analysis easier for researchers, governments, and other entities examining this issue, we could be losing information about the specifics of the causes of death. Future versions of the dataset could include not only the current cause-of-death variable but also an additional string variable with more detail on the specific causes of death, which may be more conducive to text analyses.
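The effect of standardized labels on word counts can be illustrated with a minimal Python sketch (not the R code used for this report). The exact label wording, the per-label incident counts, and the stop-word list are assumptions inferred from the word table in this section; only “Mixed or Unknown” is named explicitly in the text.

```python
import re
from collections import Counter

# ASSUMPTION: label wording and incident counts are inferred from the
# word-frequency table in this report, not read from the dataset itself.
LABEL_COUNTS = {
    "Mixed or unknown": 2730,
    "Drowning": 2538,
    "Vehicle accident / death linked to hazardous transport": 1336,
    "Harsh environmental conditions / lack of adequate shelter, food, water": 1014,
    "Sickness / lack of access to adequate healthcare": 818,
    "Violence": 927,
    "Accidental death": 306,
}

# ASSUMPTION: a tiny stop-word list chosen for this illustration.
STOP_WORDS = {"or", "to", "of"}


def word_frequencies(label_counts, stop_words=STOP_WORDS):
    """Count each word once per incident carrying a label that contains it."""
    freq = Counter()
    for label, n_incidents in label_counts.items():
        for token in re.findall(r"[a-z]+", label.lower()):
            if token not in stop_words:
                freq[token] += n_incidents
    return freq


freq = word_frequencies(LABEL_COUNTS)
total_words = sum(freq.values())
for word, n in freq.most_common(5):
    print(f"{word:12s} {n:5d} ({100 * n / total_words:.1f}%)")
```

Under these assumptions, words that share a label always receive identical counts (for example “mixed” and “unknown”), and a word that appears in two labels, such as “lack”, receives the sum of both.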
| Main Words Identified from Causes of Death | Overall (N=29755) |
|---|---|
| access | 818 (2.7%) |
| accident | 1336 (4.5%) |
| accidental | 306 (1.0%) |
| adequate | 1832 (6.2%) |
| conditions | 1014 (3.4%) |
| death | 1642 (5.5%) |
| drowning | 2538 (8.5%) |
| environmental | 1014 (3.4%) |
| food | 1014 (3.4%) |
| harsh | 1014 (3.4%) |
| hazardous | 1336 (4.5%) |
| healthcare | 818 (2.7%) |
| lack | 1832 (6.2%) |
| linked | 1336 (4.5%) |
| mixed | 2730 (9.2%) |
| shelter | 1014 (3.4%) |
| sickness | 818 (2.7%) |
| transport | 1336 (4.5%) |
| unknown | 2730 (9.2%) |
| vehicle | 1336 (4.5%) |
| violence | 927 (3.1%) |
| water | 1014 (3.4%) |
We also wanted to visually present the main words identified. So, we created a word cloud to present the main words from causes of death (specifically words that were listed a minimum of 1300 times to demonstrate those listed the most frequently). This word cloud supported our previous findings that the main causes of death were mixed or unknown and drowning.
Lastly, we created a bar plot of the top 10 most frequently reported words, illustrating the main causes of death. Again, “mixed”, “unknown”, and “drowning” were the most frequent.
As a secondary text analysis, we examined the main words identified from the causes of death to see if there were differences across years. In 2014 and 2016-2019, the main words identified (representing the main causes of death among migrants in those years) were “mixed” and “unknown”. Another of the most frequent words in those years was “drowning”. In 2015 and 2020-2021, the main word identified from causes of death was “drowning”. Our finding that “mixed” and “unknown” are the most frequent words demonstrates that causes of death among migrants are multifactorial and most likely difficult to document due to the circumstances of their deaths and disappearances. Additionally, it means that we are somewhat limited in identifying interventions based on causes of death from this analysis.
Comparatively, the second most frequent word identified (representing the second most frequent cause of death across years), and the most frequent since 2020, is “drowning”. This demonstrates that certain migration routes across water are more deadly than other routes. This finding is supported by ongoing news stories of tragedies involving drownings (one of the most recent in November 2021; Noack et al.). It also speaks to the rationale for the development of the Missing Migrants Project, which was developed when two shipwrecks occurred near the Italian island of Lampedusa and more than 368 individuals lost their lives.
To further explore the causes of death, we conducted additional analyses focused on violence as a cause of death.
(3) Violence by Region: Machine Learning
Create Region Category
A regional category was created to allow for analysis in the machine learning algorithms. The values for each of the regional categories are shown below.
1 = Caribbean
2 = Central America
3 = Central Asia
4 = Eastern Africa
5 = Eastern Asia
6 = Europe
7 = Mediterranean
8 = Middle Africa
9 = North America
10 = Northern Africa
11 = South America
12 = South-eastern Asia
13 = Southern Africa
14 = Southern Asia
15 = Western Africa
16 = Western Asia
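The region coding listed above can be sketched in Python as a simple lookup (an illustration only, not the R code used for this report). The helper names `encode_region` and `one_hot_region` are hypothetical. The one-hot variant is included as an alternative because integer codes imply an ordering among regions that has no geographic meaning.

```python
# Region names in the order of the codes listed in this report (1..16).
REGIONS = [
    "Caribbean", "Central America", "Central Asia", "Eastern Africa",
    "Eastern Asia", "Europe", "Mediterranean", "Middle Africa",
    "North America", "Northern Africa", "South America",
    "South-eastern Asia", "Southern Africa", "Southern Asia",
    "Western Africa", "Western Asia",
]

# Integer code for each region, starting at 1 as in the report.
REGION_CODE = {name: code for code, name in enumerate(REGIONS, start=1)}


def encode_region(name):
    """Return the integer region_category for a region name (hypothetical helper)."""
    return REGION_CODE[name]


def one_hot_region(name):
    """Alternative encoding: one 0/1 indicator per region, with no implied order."""
    if name not in REGION_CODE:
        raise KeyError(name)
    return [1 if region == name else 0 for region in REGIONS]


print(encode_region("Mediterranean"))
print(one_hot_region("Europe"))
```

With the integer coding, a model that treats region_category as numeric estimates a single slope across codes 1 to 16; with indicators, each region gets its own effect.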
## [1] 13537 27
## [1] 5801 27
Based on the confusion matrices, all of the models have high accuracies of about 0.90, with very high specificity and very low sensitivity. This is likely due to class imbalance: deaths due to violence are relatively rare in the overall data set. The summary statistics for the logistic regression model show that only region_category is a significant predictor of death due to violence; none of the month terms is significant at the 0.05 level. This suggests that the month of migration does not meaningfully affect the likelihood of dying from violence and that, in this model, the likelihood is driven by region.
The high accuracies stem from this class imbalance: a model that assigns every incident an outcome of 0 (death due to a cause other than violence) is correct about 90% of the time, which also explains the high specificity values.
##
## Call:
## glm(formula = violence ~ region_category + month, family = "binomial",
## data = train_set)
##
## Deviance Residuals:
## Min 1Q Median 3Q Max
## -0.6479 -0.4834 -0.4264 -0.3776 2.4896
##
## Coefficients:
## Estimate Std. Error z value Pr(>|z|)
## (Intercept) -2.903797 0.132442 -21.925 <2e-16 ***
## region_category 0.082176 0.007816 10.514 <2e-16 ***
## monthAugust -0.171684 0.144219 -1.190 0.234
## monthDecember -0.189279 0.153686 -1.232 0.218
## monthFebruary 0.052058 0.150553 0.346 0.730
## monthJanuary 0.076035 0.143987 0.528 0.597
## monthJuly -0.207263 0.145592 -1.424 0.155
## monthJune -0.187702 0.144470 -1.299 0.194
## monthMarch 0.105565 0.147368 0.716 0.474
## monthMay 0.134399 0.146859 0.915 0.360
## monthNovember -0.238199 0.153090 -1.556 0.120
## monthOctober -0.284773 0.145951 -1.951 0.051 .
## monthSeptember -0.231409 0.143518 -1.612 0.107
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## (Dispersion parameter for binomial family taken to be 1)
##
## Null deviance: 8553.9 on 13536 degrees of freedom
## Residual deviance: 8417.9 on 13524 degrees of freedom
## AIC: 8443.9
##
## Number of Fisher Scoring iterations: 5
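To make the coefficients in the summary above easier to interpret, the following Python sketch (not the R code used for this report) converts them to an odds ratio and to predicted probabilities. It assumes that April is the baseline month, since it is the only month without a coefficient, and that region_category enters the model as a numeric code from 1 to 16, as the single region_category coefficient suggests.

```python
import math

# Coefficients copied from the glm summary in this report (log-odds scale).
INTERCEPT = -2.903797
REGION_SLOPE = 0.082176
MONTH_EFFECTS = {
    "April": 0.0,  # ASSUMPTION: baseline month (no coefficient is printed for it)
    "August": -0.171684, "December": -0.189279, "February": 0.052058,
    "January": 0.076035, "July": -0.207263, "June": -0.187702,
    "March": 0.105565, "May": 0.134399, "November": -0.238199,
    "October": -0.284773, "September": -0.231409,
}


def predicted_probability(region_category, month):
    """Inverse-logit of the linear predictor for one incident."""
    log_odds = INTERCEPT + REGION_SLOPE * region_category + MONTH_EFFECTS[month]
    return 1.0 / (1.0 + math.exp(-log_odds))


# Multiplicative change in the odds of violence per one-step increase in the code.
odds_ratio_per_code = math.exp(REGION_SLOPE)

# Largest predicted probability over every region code and month.
highest = max(
    predicted_probability(region, month)
    for region in range(1, 17)
    for month in MONTH_EFFECTS
)

print(f"odds ratio per region code: {odds_ratio_per_code:.3f}")
print(f"P(violence | region 1, April): {predicted_probability(1, 'April'):.3f}")
print(f"highest predicted probability: {highest:.3f}")
```

Under these assumptions, no combination of region and month produces a predicted probability above 0.5, which is consistent with confusion matrices in which no incident is predicted as violence if a 0.5 cutoff is used (the cutoff is also an assumption).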
## Confusion Matrix and Statistics
##
## Reference
## Prediction 0 1
## 0 5245 556
## 1 0 0
##
## Accuracy : 0.9042
## 95% CI : (0.8963, 0.9116)
## No Information Rate : 0.9042
## P-Value [Acc > NIR] : 0.5113
##
## Kappa : 0
##
## Mcnemar's Test P-Value : <2e-16
##
## Sensitivity : 0.00000
## Specificity : 1.00000
## Pos Pred Value : NaN
## Neg Pred Value : 0.90415
## Prevalence : 0.09585
## Detection Rate : 0.00000
## Detection Prevalence : 0.00000
## Balanced Accuracy : 0.50000
##
## 'Positive' Class : 1
##
## Confusion Matrix and Statistics
##
## Reference
## Prediction 0 1
## 0 5245 556
## 1 0 0
##
## Accuracy : 0.9042
## 95% CI : (0.8963, 0.9116)
## No Information Rate : 0.9042
## P-Value [Acc > NIR] : 0.5113
##
## Kappa : 0
##
## Mcnemar's Test P-Value : <2e-16
##
## Sensitivity : 0.00000
## Specificity : 1.00000
## Pos Pred Value : NaN
## Neg Pred Value : 0.90415
## Prevalence : 0.09585
## Detection Rate : 0.00000
## Detection Prevalence : 0.00000
## Balanced Accuracy : 0.50000
##
## 'Positive' Class : 1
##
## Confusion Matrix and Statistics
##
## truth
## pred 0 1
## 0 5213 515
## 1 32 41
##
## Accuracy : 0.9057
## 95% CI : (0.8979, 0.9131)
## No Information Rate : 0.9042
## P-Value [Acc > NIR] : 0.3542
##
## Kappa : 0.1106
##
## Mcnemar's Test P-Value : <2e-16
##
## Sensitivity : 0.073741
## Specificity : 0.993899
## Pos Pred Value : 0.561644
## Neg Pred Value : 0.910091
## Prevalence : 0.095846
## Detection Rate : 0.007068
## Detection Prevalence : 0.012584
## Balanced Accuracy : 0.533820
##
## 'Positive' Class : 1
##
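The statistics in the confusion matrices above can be recomputed directly from the four cell counts. The Python sketch below (not the R code used for this report) does this for the all-zero predictions and for the third matrix, treating class 1 (violence) as the positive class, as the output does. The function name `classification_metrics` is hypothetical.

```python
def classification_metrics(tn, fn, fp, tp):
    """Summary statistics for a 2x2 confusion matrix with class 1 as positive."""
    total = tn + fn + fp + tp
    sensitivity = tp / (tp + fn)  # share of true violence cases detected
    specificity = tn / (tn + fp)  # share of non-violence cases detected
    return {
        "accuracy": (tn + tp) / total,
        "sensitivity": sensitivity,
        "specificity": specificity,
        "balanced_accuracy": (sensitivity + specificity) / 2,
        # Accuracy obtained by always predicting the majority class.
        "no_information_rate": max(tn + fp, tp + fn) / total,
    }


# Counts copied from the confusion matrices in this report.
all_zero = classification_metrics(tn=5245, fn=556, fp=0, tp=0)
third_model = classification_metrics(tn=5213, fn=515, fp=32, tp=41)

for name, metrics in [("all-zero", all_zero), ("third matrix", third_model)]:
    print(name, {key: round(value, 4) for key, value in metrics.items()})
```

Because accuracy for the all-zero predictions equals the no-information rate, balanced accuracy and sensitivity are more informative than accuracy for comparing models on this outcome.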
The KNN model has the highest AUC with a value of 0.7461.
## Area under the curve: 0.6313
## Area under the curve: 0.6808
## Area under the curve: 0.7461
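AUC can differ between models even when their accuracies are nearly identical, because it depends on how the predicted scores rank the cases rather than on a single cutoff. A minimal Python sketch of the rank-based definition follows (not the R code used for this report). The function name `auc` and the toy scores and labels are made up for illustration.

```python
def auc(scores, labels):
    """Probability that a random positive case scores higher than a random
    negative case, counting ties as one half (rank-based AUC)."""
    positives = [s for s, y in zip(scores, labels) if y == 1]
    negatives = [s for s, y in zip(scores, labels) if y == 0]
    if not positives or not negatives:
        raise ValueError("both classes must be present")
    wins = 0.0
    for p in positives:
        for n in negatives:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(positives) * len(negatives))


# Hypothetical predicted probabilities and true outcomes (1 = violence).
toy_scores = [0.10, 0.40, 0.35, 0.80]
toy_labels = [0, 0, 1, 1]
print(auc(toy_scores, toy_labels))
```

In this toy example, three of the four positive-negative pairs are ranked correctly, so the AUC is 0.75 even though a 0.5 cutoff would flag only one case.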
(4) Violence by Region: Decision Tree
The decision tree takes into account region and month of migration to determine the likelihood of death by violence. The decision tree shows that 45% of deaths by violence occur in region categories less than or equal to 6. These six regions are the Caribbean, Central America, Central Asia, Eastern Africa, Eastern Asia, and Europe.
Overall, the decision tree matches the machine learning models: regional category is the sole predictor of death due to violence, and month of migration does not appear to have a significant impact on the outcome.
## Call:
## rpart(formula = violence ~ region_category + month, data = mm,
## subset = train)
## n= 9669
##
## CP nsplit rel error xerror xstd
## 1 0.04490980 0 1.0000000 1.0001633 0.02763728
## 2 0.01595402 1 0.9550902 0.9558953 0.02667908
## 3 0.01000000 3 0.9231822 0.9246948 0.02489651
##
## Variable importance
## region_category
## 100
##
## Node number 1: 9669 observations, complexity param=0.0449098
## mean=0.09732134, MSE=0.08784989
## left son=2 (9352 obs) right son=3 (317 obs)
## Primary splits:
## region_category < 15.5 to the left, improve=0.044909800, (0 missing)
## month splits as RLLRLLRLRLLL, improve=0.001438313, (0 missing)
##
## Node number 2: 9352 observations, complexity param=0.01595402
## mean=0.08575706, MSE=0.07840278
## left son=4 (5817 obs) right son=5 (3535 obs)
## Primary splits:
## region_category < 9.5 to the left, improve=0.017267370, (0 missing)
## month splits as RLLRRLRRRRLL, improve=0.001238113, (0 missing)
##
## Node number 3: 317 observations
## mean=0.4384858, MSE=0.246216
##
## Node number 4: 5817 observations, complexity param=0.01595402
## mean=0.05707409, MSE=0.05381664
## left son=8 (4310 obs) right son=9 (1507 obs)
## Primary splits:
## region_category < 5.5 to the right, improve=0.046134670, (0 missing)
## month splits as RLLRRLLRLRLL, improve=0.002053639, (0 missing)
##
## Node number 5: 3535 observations
## mean=0.1329562, MSE=0.1152788
##
## Node number 8: 4310 observations
## mean=0.02761021, MSE=0.02684789
##
## Node number 9: 1507 observations
## mean=0.1413404, MSE=0.1213633
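The terminal nodes in the summary above can be cross-checked with a short Python sketch (not the R code used for this report). Each leaf mean is the proportion of incidents in that leaf coded as violence, so multiplying it by the number of observations recovers an approximate count. The region ranges assigned to each leaf are this sketch's reading of the printed splits (15.5, 9.5, and 5.5) and should be treated as an assumption.

```python
# (node number, region codes covered, observations, mean of violence)
# Observations and means are copied from the rpart summary in this report;
# the region ranges are an ASSUMPTION based on reading the printed splits.
LEAVES = [
    (8, range(6, 10), 4310, 0.02761021),
    (9, range(1, 6), 1507, 0.1413404),
    (5, range(10, 16), 3535, 0.1329562),
    (3, range(16, 17), 317, 0.4384858),
]

total_obs = sum(n for _, _, n, _ in LEAVES)
violent_counts = {node: round(n * mean) for node, _, n, mean in LEAVES}
total_violent = sum(violent_counts.values())
root_mean = total_violent / total_obs  # should match the mean at node 1

for node, regions, n, mean in LEAVES:
    share = violent_counts[node] / total_violent
    print(
        f"node {node}: regions {regions.start}-{regions.stop - 1}, "
        f"violence rate {mean:.3f}, share of all violence cases {share:.3f}"
    )
print(f"root mean recomputed from leaves: {root_mean:.5f}")
```

The sketch separates two quantities that are easy to confuse: the violence rate within a leaf (the leaf mean) and the share of all violence cases that fall in that leaf.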
Implications
People are forcibly displaced at a rate of 34,000 per day due to conflict or persecution, and about 10 million people worldwide are stateless, having been denied access to basic rights such as safety, health care, labor, and freedom of movement (United Nations). The IOM’s Missing Migrants Project highlights the complexity of the politics of borders and the need for a focus on a human rights framework. There is a need to identify the critical periods, cumulative impacts, and core pathways that shape migration.
OHCHR